Conductivity experiments for electrolyte formulations and their automated analysis

Electrolytes are considered crucial for the performance of batteries, and therefore indispensable for future energy storage research. This paper presents data that describes the effect of the electrolyte composition on the ionic conductivity. In particular, the data focuses on electrolytes composed of ethylene carbonate (EC), propylene carbonate (PC), ethyl methyl carbonate (EMC), and lithium hexafluorophosphate (LiPF6). The mass ratio of EC to PC was varied, while keeping the mass ratio of (EC + PC) and EMC at fixed values of 3:7 and 1:1. The conducting salt concentration was also varied during the study. Conductivity data was obtained from electrochemical impedance spectroscopy (EIS) measurements at various temperatures. Based on the thus obtained temperature series, the activation energy for ionic conduction was determined during the analysis. The data is presented here in a machine-readable format and includes a Python package for analyzing temperature series of electrolyte conductivity according to the Arrhenius equation and EIS data. The data may be useful e.g. for the training of machine learning models or for reference prior to experiments.


Background & Summary
Electrolytes are crucial for the performance of batteries 1 since they enable shuttling of the ions, provide electrical isolation of the electrodes and have a defining influence on the formation and stability of the solid electrolyte interface (SEI) 2 and the cathode electrolyte interface (CEI) [2][3][4]. Achieving high-performance electrolytes typically requires the presence of various components such as organic solvents, co-solvents, functional additives and conducting salts 5. The concentration of each component and the ratio between the components have a strong impact on the conductivity of the electrolyte [6][7][8]. Ding et al. showed in several studies [6][7][8][9] that the composition of the electrolyte, especially the PC content, affects the viscosity and glass transition temperature of the electrolyte. The presence of PC also hinders crystallization of EC 6,10. This allows for the formulation of electrolytes with improved performance at low temperatures 10,11.
The dataset 12 presented herein provides a comprehensive basis for future optimization studies, as it contains a wide variation of formulations and temperatures, including the raw data. Furthermore, it can help to gain deeper insights regarding composition-property-performance relationships. Fractions of this dataset served as the basis for several machine learning models published elsewhere 11,13,14. The automated high-throughput experimentation system 13 available at the Helmholtz Institute Münster is used to formulate a variety of electrolyte solutions based on EC, EMC, PC and LiPF6. Ratios of (PC + EC):EMC of 3:7 and 1:1 are covered in the dataset 12. The concentration of the conducting salt varies between 0.2 mol kg⁻¹ and 2.1 mol kg⁻¹, while the ratio of EC:PC ranges from 0.0 to 9.2.
The robotic system 13 used for the acquisition of the data is able to dispense liquid and solid components into aluminium or polymer vials with high accuracy. Each formulation is identified by a batch number, and each measurement is identified by a unique ID that is stored and reported on the vial through a QR code. After sample preparation, the automated setup performs the targeted measurement. Subsequently, the system returns a JSON-formatted file for each formulation, which allows for downstream processing. Here, we present the data 12 aggregated in a CSV file. Analyzing the raw data is time intensive, which is why we have developed an automated Python-based data analysis package called Modular and Autonomous Data Analysis Platform (MADAP) 15 with a command line interface (CLI) and a graphical user interface (GUI) that can process the aggregated CSV. This package is generalized and can be used on a variety of datasets as described below. The overall workflow of generating and analyzing data is shown in Fig. 1. All input parameters are tracked and saved in the output obtained from MADAP 15 to allow full data provenance tracking 16,17 of not just the experimental but also the data analysis steps in the research workflow 18.
The dataset 12 can be used to train machine learning models in order to predict promising electrolyte formulations that reach an optimum conductivity, as demonstrated by Rahmanian et al. 11. Further, the research community may find the data useful in the design of their own experiments and in decisions concerning the use of hardware, software and human resources. Using this dataset together with analysis tools like MADAP 15 as a basis for further lithium-ion battery research enables the generation of further insights such as the activation energy of the ion conduction process. It is also possible to add other analysis procedures to MADAP 15 to further expand the automation it provides.

Methods
High throughput experimentation (HTE) system. The robotic HTE system 13, used to acquire the data 12 presented here, is designed for high-throughput operation in a nitrogen atmosphere. The setup designed for the formulation of electrolyte solutions is able to prepare 96 formulations in 8 h by gravimetric dosing of solid and liquid materials into polymer or aluminium vials. Up to 10 mL of electrolyte can be formulated within one vial. The setup also provides functionalities to close the vials and to mix and heat their content using a heated shaker plate. Further, EIS measurements are performed automatically. To track the samples, each vial is automatically labelled using a QR code representing information like the date of preparation, an ID for the electrolyte mixture and information regarding the chemicals used. In preparation for EIS measurements, a volume of 750 μL of the electrolyte is automatically filled into single-use Eppendorf® Safe-Lock Tubes with a capacity of 2 mL. The use of single-use equipment avoids cross contamination in this step of the process. Subsequently, electrodes are automatically immersed into the sample. These electrodes are designed to generate reproducible results independent of the shape of the vial or the depth of immersion 19. For the measurement, the samples are arranged in groups of eight samples per rack, three of which are mounted on one larger rack. Four of these combined racks can be connected to the Metrohm Autolab potentiostat, which is used for the measurements 13.

EIS measurement.
After the assembly of the racks, they are manually transferred to a Memmert TTC256 temperature chamber for EIS measurements. The connection of the cells to the Metrohm Autolab potentiostat is also done by the operator. The temperature chamber is programmed to cover the temperature range between −30 °C and 60 °C in steps of 10 °C. After an equilibration period of 2 h at each temperature, the EIS measurements are performed automatically with an applied AC voltage of 40 mV and frequencies between 20 kHz and 50 Hz. A multiplexer distributes the output of each of the twelve channels to eight outputs. Hence, 96 channels are available to connect to each of the 96 cells on a rack 13. Each experiment is repeated several times to provide up to 8 sets of values to the dataset. Repetitions can be identified and distinguished based on the running number in the experimentID.
Data management in the experimental setup. The data recording during the experimental workflow is handled by a laboratory information management system. It records identifiers for the starting materials, test protocols and relevant experimental parameters. Furthermore, the system is able to merge these data with metadata comprising further details about the electrolytes used in a measurement. After conclusion of a measurement, the collected data including the metadata is saved to a JSON file, which can be used for analysis.

Analysis Software (MADAP).
For the data analysis, a variety of tools are available, e.g. ZView 20, pyEIS 21, impedance 22, Aftermath 23 and Origin 24. We decided to bundle some of these tools into a compact, modular software package called MADAP 15, thoroughly documented using sphinx 25. This analysis tool provides all the necessary means to perform electrochemical data analysis based on experimental datasets, while providing full data provenance tracking, and to plot publication-quality results. It can perform a variety of automated electrochemical analyses, including EIS, linear and cyclic voltammetry and the analysis of temperature series according to the Arrhenius equation. In this paper, we focus on Arrhenius analysis and EIS measurements. MADAP 15 is implemented in Python 3 and is publicly accessible as a GitHub repository (https://github.com/fuzhanrahmanian/MADAP) 15, a pip installable package (pip install madap), and an executable (https://github.com/fuzhanrahmanian/MADAP/releases/tag/v1.0.0) with a graphical user interface (GUI) created with PySimpleGUI 26,27, as shown in Fig. 2. The accessibility of MADAP 15 by means of a CLI as well as a GUI provides the broader scientific community with a variety of entry points for the data analysis. The generic nature of the procedure ensures that the package can be expanded with further analysis methods without impacting the existing methodologies. Further, this enables its integration into autonomous research workflows [28][29][30]. The basic workflow of an analysis using MADAP 15 comprises the three steps of data acquisition, pre-processing and the analysis itself. In the first step, the user can import different data types (.txt, .json, .hdf5 or .h5, .xml, .pkl and .csv) and select the data to be analyzed based on ranges of indices for rows and columns or by specifying column labels.
The pre-processing step can detect outliers based on given upper and lower limits of the relevant quantile using the quantile-based flooring and capping algorithm 31. The user may choose to specify custom limits or use the default values implemented in MADAP 15. In version 1.0, the default values are 0.01 for the lower and 0.99 for the upper limit. Afterwards, the user can choose which type of analysis is to be performed, i.e. voltammetry, EIS or Arrhenius. Figure 3 depicts the code structure used in MADAP 15. Each analysis procedure is implemented as a subclass of an abstract class called EChemProcedure, which enforces the presence of methods called analyze, plot, save_data and perform_all_actions. All procedures additionally inherit from the common Plots class, which equips them with the common plotting functionalities, providing outputs with scientific format 32. The complete procedure is continuously logged so that potential errors can be reviewed.
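The quantile limits described above can be illustrated with a minimal sketch in plain Python. It is not the MADAP implementation: the function names `quantile` and `floor_and_cap`, the linear interpolation of the quantile, and the choice to return both clipped values and outlier flags are assumptions made for illustration. Only the default limits of 0.01 and 0.99 are taken from the text.

```python
def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list (0 <= q <= 1)."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def floor_and_cap(values, lower_q=0.01, upper_q=0.99):
    """Return (clipped values, outlier flags) for the given quantile limits."""
    ordered = sorted(values)
    floor = quantile(ordered, lower_q)
    cap = quantile(ordered, upper_q)
    # A value counts as an outlier if it lies outside [floor, cap].
    flags = [v < floor or v > cap for v in values]
    # Flooring and capping: pull outliers back onto the nearest limit.
    clipped = [min(max(v, floor), cap) for v in values]
    return clipped, flags


# Illustrative data (not from the dataset): 0..99 plus one spurious reading.
readings = [float(i) for i in range(100)] + [1000.0]
clipped, flags = floor_and_cap(readings)
```

Whether flagged points are clipped or dropped is a design choice; the flags returned here support either option.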
The linear fit required for the Arrhenius type analysis 33 is implemented in MADAP 15 using the functionalities for linear regression provided in the scikit-learn package 34. The activation energy and the pre-exponential factor are derived from this fit. The regression loss, which is chosen as a quality metric, is calculated using the mean squared error (MSE). Finally, plots and data files for the raw and fitted data as well as the model's parameters are automatically generated and saved in a designated location in accordance with the FAIR (Findability, Accessibility, Interoperability, and Reusability) data principle 35.
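As an illustration of the Arrhenius type analysis, the following sketch fits ln(σ) = ln(A) − Ea/(R·T) by ordinary least squares. It uses a closed-form fit in plain Python rather than scikit-learn. The function name `arrhenius_fit`, the returned dictionary keys, and the evaluation of the MSE in ln(σ) space are assumptions, not the MADAP interface. The data are synthetic.

```python
import math

R_GAS = 8.314462618  # molar gas constant in J mol^-1 K^-1


def arrhenius_fit(temperatures_c, conductivities):
    """Least-squares fit of ln(sigma) = ln(A) - Ea / (R * T)."""
    x = [1.0 / (t + 273.15) for t in temperatures_c]  # inverse temperature, 1/K
    y = [math.log(s) for s in conductivities]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx                    # equals -Ea / R
    intercept = y_mean - slope * x_mean  # equals ln(A)
    mse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)) / n
    return {
        "activation_energy": -slope * R_GAS,     # J/mol
        "pre_exponential": math.exp(intercept),  # same unit as sigma
        "mse": mse,                              # evaluated in ln(sigma) space
    }


# Synthetic temperature series from -30 °C to 60 °C in steps of 10 °C,
# generated with Ea = 15 kJ/mol and A = 0.5 (illustrative values only).
temps = list(range(-30, 70, 10))
sigma = [0.5 * math.exp(-15000.0 / (R_GAS * (t + 273.15))) for t in temps]
result = arrhenius_fit(temps, sigma)
```

Because the synthetic data follow the Arrhenius law exactly, the fit recovers the parameters used to generate them up to rounding error.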
EIS analysis and fitting are performed by a partial adoption of the impedance package provided by Murbach et al. 22. In this package, the model uses a non-linear least-squares fit as supplied by the SciPy 36 package. The EImpedance module of MADAP 15 lets the user define an equivalent circuit from the available elements and their corresponding values. In this case, the user should provide an initial guess for the value of each element in the selected circuit. Based on these guesses, MADAP 15 fits the selected data internally and evaluates the quality of the fit. For this quality check, the root-mean-square error (RMSE) of the fit is determined and compared to the root mean square (RMS) of the experimental data. If the ratio of RMSE to RMS exceeds a threshold (δ), a re-evaluation is triggered. In this case, the standard deviation of each estimated element value is added to or subtracted from the respective value to improve generalization. The operation to be carried out is selected randomly for each value. These new values are then used as the input guesses for the subsequent fit. This procedure is iterated until either the ratio of RMSE to RMS is below δ, i.e. Equation 1 (RMSE/RMS < δ) is fulfilled, or 5 iterations are reached. The number of iterations and δ were set heuristically to 5 and 0.01, respectively, although the user can change them and define custom values as required.
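The re-evaluation loop can be sketched as follows. This is a schematic under stated assumptions, not MADAP code: `fit_with_retries` is a hypothetical name, and `fit` stands for any callable that returns the fitted parameters, their standard deviations and the fitted impedance values. The toy fit below replaces a real equivalent-circuit fit and only serves to make the example runnable.

```python
import math
import random


def rms(values):
    """Root mean square of the magnitudes of (possibly complex) values."""
    return math.sqrt(sum(abs(v) ** 2 for v in values) / len(values))


def fit_with_retries(fit, z_measured, guesses, delta=0.01, max_iterations=5, rng=None):
    """Repeat `fit` with perturbed guesses until RMSE / RMS < delta."""
    rng = rng or random.Random()
    for iteration in range(1, max_iterations + 1):
        params, stds, z_fit = fit(z_measured, guesses)
        rmse = rms([zm - zf for zm, zf in zip(z_measured, z_fit)])
        ratio = rmse / rms(z_measured)
        if ratio < delta:
            break
        # Add or subtract one standard deviation per parameter; the sign is
        # chosen at random for each value, as described in the text.
        guesses = [p + rng.choice((-1.0, 1.0)) * s for p, s in zip(params, stds)]
    return params, ratio, iteration


def toy_fit(z_measured, guesses):
    """Hypothetical stand-in for a circuit fit: a resistor that only moves
    80 % of the way from the guess to the mean real impedance per call."""
    target = sum(z.real for z in z_measured) / len(z_measured)
    resistance = guesses[0] + 0.8 * (target - guesses[0])
    z_fit = [complex(resistance, 0.0)] * len(z_measured)
    return [resistance], [0.1], z_fit


spectrum = [complex(100.0, 0.0)] * 5  # made-up purely resistive spectrum
params, ratio, iterations = fit_with_retries(
    toy_fit, spectrum, [200.0], rng=random.Random(0)
)
```

Passing the random generator explicitly keeps the perturbation reproducible, which helps when the analysis steps need to be tracked.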
Alternatively, MADAP 15 provides the option to iterate over 40 common hard-coded equivalent circuits, which are provided as part of the MADAP 15 package, without further user input. In this case, the match with the lowest RMSE is chosen, and the RMSE is used as the loss metric in the analysis. For every impedance spectrum, the fitted circuit parameters and their uncertainties, the loss metric, the determined resistance and the corresponding conductivity are saved automatically. To provide information about the linearity and stability of the fit, the improved linear Kramers-Kronig (linKK) method 37 as implemented in the impedance module 22 is applied automatically to each spectrum. For visualization, a Nyquist and a Bode plot comprising the raw and fitted data as well as a residual plot for the linKK method are generated and saved accordingly. Figure 4b shows the data and the fit of randomly selected spectra corresponding to different quantiles of the RMSE to convey an impression of the achieved quality of the fit. For each quantile, four spectra and their respective fits are shown. To evaluate the reliability of the fit, the results were benchmarked against a manual analysis of the selected data using the Metrohm Autolab software as a baseline. In comparison to this baseline, MADAP 15 provides acceptable fits for the majority of the spectra. The same principle was applied for the Arrhenius analyses, depicted in Fig. 4a.

Data Records
The dataset 12 presented here comprises, among others, the conductivity, the real and imaginary parts of the impedance as determined by EIS measurements and information regarding the formulation of a variety of electrolyte formulations for lithium-based batteries. The formulations relate to the masses of the solvent components EC, PC, and EMC and the conducting salt LiPF6.
We provide the dataset as a dataframe in CSV file format, which can be downloaded from https://doi.org/10.5281/zenodo.7244939 12 and may be used under the CC BY license. A summary of its structure is presented in Table 1. This table also shows the data type, the range of values covered for each quantity, the number of unique values and the physical unit. In this section, we elucidate more on the data and the interrelations within the dataframe.
Fig. 3 The stylized Unified Modelling Language (UML) diagram that represents the code structure of MADAP 15.
The robotic system operated at the Helmholtz Institute Münster outputs the raw data in JSON format. Although this format is machine-readable, we decided to provide the data in CSV format, which can easily be read into the user's script as a table, e.g. using the Pandas 38 library available for Python. Each line in the dataframe represents all the data available for a single measurement. Parameters that are shared by several experiments are repeated in each line where they are applicable. In the following, we will elucidate more on each column of the dataframe.
temperature. The temperature at which each measurement was performed is reported in this column. Each row corresponds to a measurement at one temperature. The values range from −30 °C to 60 °C. For five formulations, the measurement at −30 °C is not reported in the dataset.
frequency. This column reports a string, which comprises a list of the frequencies used in the EIS measurements.

EMC.
In this column, we report the mass of EMC used for the preparation of the electrolyte formulation. The values are given in g and range from 0.480 g to 9.457 g.
LiPF_6. This column presents the mass in g of LiPF6 contained in the formulations. The values range from 0.301 g to 4.093 g.

metadata.
In this column, additional information is reported that cannot be reasonably presented in tabular form. The metadata are presented as a string of a dictionary. It reports the date and type of the experiment using the keys experimentDate and experimentType, respectively. Further, the version of the JSON format is associated with the key formatVersion. The number of the channel running the experiment, the amount of electrolyte used in the respective measurement, and the suspected measurement error are associated with the keys channel, electrolyteAmount, and suspectedMeasurementError, respectively. The keys PC, EC, EMC, and LiPF6 are linked to further information regarding the respective electrolyte component, which is represented in dictionary format. Within each of these dictionaries, the keys Batch-No, CAS-No, and comment present the respective information as a string. The date of delivery and the date of opening of the container are given as strings in the format MM/YY and can be accessed using the keys dateOfDelivery and dateOfOpening. The molar mass of the substance is reported as a float with the key molarMass, while its unit is given as a string using the key molarMassUnit. The name key is associated with a string stating the long name of the chemical. The purity of the material is found using the key purity, while the SMILES string is given with the key SMILES. Both of these quantities are reported as strings. The amount of the respective substance used in the formulation is accessed with the key substanceAmount, while the respective unit is found using the key substanceAmountUnit. Finally, the supplier key returns the supplier from which the material was obtained.
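Since the metadata are stored as a string of a dictionary, they have to be parsed before the keys can be accessed. The sketch below shows one way to do this with the Python standard library. Whether the string is valid JSON or Python literal syntax is not stated here, so the hypothetical helper `parse_metadata` tries both. The example record is made up and contains only a few of the keys listed above.

```python
import ast
import json


def parse_metadata(cell):
    """Turn the string stored in the metadata column into a dictionary."""
    try:
        return json.loads(cell)
    except json.JSONDecodeError:
        # Fall back to Python literal syntax (single quotes, None, True).
        return ast.literal_eval(cell)


# Made-up example cell; the values are illustrative, not taken from the dataset.
cell = str({
    "experimentType": "EIS",
    "channel": 3,
    "EC": {
        "name": "ethylene carbonate",
        "molarMass": 88.06,
        "molarMassUnit": "g/mol",
        "substanceAmount": 0.05,
        "substanceAmountUnit": "mol",
    },
})

metadata = parse_metadata(cell)
ec = metadata["EC"]  # nested dictionary describing one component
```

`ast.literal_eval` only evaluates literals, which makes it a safer choice than `eval` for strings read from a file.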
Moreover, the dataframe also contains data resulting from the analysis of the experimental data using the MADAP 15 Python package. The MADAP 15 analysis workflow is performed on a Lenovo workstation with an AMD Ryzen Threadripper PRO 3975WX processor at 3500 MHz with 32 cores and 64 logical processors. The workstation is equipped with 128 GB of RAM and an RTX A6000 GPU and runs Microsoft Windows 10 Pro. The single-core performance of the CPU turned out to be a bottleneck during operation, since the libraries used are not optimized for multicore processing or GPU training. Hence, MADAP 15 was configured to use all 32 cores for multithreaded operation for this scenario. In the following, we elucidate more on the analyzed results contained in the dataframe by going through the column names associated with analyzed data.
phase_shift. This column reports the phase shift (φ) or phase angle as obtained from the EIS analysis implemented in the MADAP 15 package according to Eq. 2. The data is given as a string of a list with values ranging from 0.131° to 89.882°.
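As an illustration, a phase angle can be computed from the real and imaginary parts of the impedance as sketched below. The definition φ = arctan(|Z''| / |Z'|) is an assumption: it is chosen because it yields values between 0° and 90°, which matches the range reported for this column, and may differ from the exact expression used in MADAP. The function name `phase_shift_deg` and the input strings are hypothetical.

```python
import ast
import math


def phase_shift_deg(z_real, z_imag):
    """Phase angle in degrees from the magnitudes of Z' and Z''."""
    return [
        math.degrees(math.atan2(abs(im), abs(re)))
        for re, im in zip(z_real, z_imag)
    ]


# The columns hold lists serialized as strings, so they are parsed first.
# The numbers below are made up for illustration.
real_cell = "[100.0, 100.0, 50.0]"
imag_cell = "[-100.0, -1.0, -0.0]"
phases = phase_shift_deg(ast.literal_eval(real_cell), ast.literal_eval(imag_cell))
```

Using `atan2` instead of a plain division avoids a division by zero when the real part vanishes.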
EIS_conductivity. The ionic conductivity obtained as the quotient of the cell constant and the resistance determined from the EIS analysis implemented in MADAP 15 is reported in this column. The conductivity is given in units of S cm⁻¹ and the values range from 0.000 S cm⁻¹ to 0.019 S cm⁻¹.
The data obtained from the analysis is verified using an appropriate metric for each analysis. For the Arrhenius type analysis, the quality of the fit is quantified by the mean squared error (MSE).
The impedance data reported in this work is pre-processed for analysis by excluding negative impedance values and outliers to enable a reliable analysis. The linKK method is used to verify the linearity of the spectrum and also reports the goodness of the fit by the statistical χ² value corresponding to the residual errors of the impedance data. Subsequently, the resulting fit of the equivalent circuit is verified by means of the RMSE. This workflow returns the parameters corresponding to the equivalent circuit as presented in the section Data Records.
For visualization, we generated quantiles based on R² and RMSE for all the fits performed during the analyses. Figure 4 shows the results of four randomly selected analyses taken from each quantile to provide an overview of the distribution of the fitting quality. In Fig. 4a, fits corresponding to quantiles based on R² are shown, while Fig. 4b presents fits for quantiles based on RMSE. The first row in each subfigure gives an impression of the lowest fit quality, while the best fits are shown in the last row of the subfigures. Additionally, the conductivity and the activation energy calculated by MADAP 15 are depicted in Fig. 6.

Usage Notes
It is recommended to apply the MADAP 15 package to use, extend or adapt the provided data analysis. To perform an analysis with the MADAP 15 package, a specific range of rows and columns of the dataframe can be selected. For example, to reproduce one of the results of this article for the Arrhenius analysis, the published dataset was selected as input for the MADAP 15 GUI, and the row indices from 3967 to 3977, column 2 for the temperatures and column 13 for the electrolyte conductivity were selected for the evaluation. Both plotting types were chosen, and the RUN button was pressed. Further results can be derived similarly.
Note that, in case a definition of the formulation in terms of molar fractions is desired, the amounts of substances for each component of the electrolyte as reported in the dictionary given in the column labelled metadata can be used.
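A minimal sketch of this conversion is given below. It assumes that the substanceAmount values of all components have already been read from the metadata dictionary and share the same unit, e.g. mol; the unit reported under substanceAmountUnit should be checked first. The function name `molar_fractions` and the amounts are made up for illustration.

```python
def molar_fractions(amounts_mol):
    """Map component name to mole fraction, given amounts of substance."""
    total = sum(amounts_mol.values())
    if total <= 0:
        raise ValueError("total amount of substance must be positive")
    return {name: n / total for name, n in amounts_mol.items()}


# Made-up amounts of substance in mol, as they might be collected from the
# substanceAmount entries of the four components.
amounts = {"EC": 0.02, "PC": 0.01, "EMC": 0.06, "LiPF6": 0.01}
fractions = molar_fractions(amounts)
```

The resulting fractions sum to one, which provides a quick consistency check after the conversion.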